NetNews Offline 2

Text File | 1996-08-06 | 5.2 KB | 110 lines

Newsgroups: comp.lang.c++,comp.programming
Path: uu4news.netcom.com!friend!news
From: rich@kastle.com (Richard Krehbiel)
Subject: Re: Why are 32 bit better than 16 bit pgms?
Message-ID: <1996Jan19.130447.12215@friend.kastle.com>
Sender: news@friend.kastle.com (News)
Reply-To: rich@kastle.com
Organization: Kastle Development Associates
X-Newsreader: Forte Free Agent 1.0.82
References: <30FBFFE6.1FEB@netcom.com>
Date: Fri, 19 Jan 1996 13:03:41 GMT

"Keith S." <vain@netcom.com> wrote:

>I have a simple questions:
> What's are 32 bit pgms better than 16 bit programs?
> Thanks!

Well, I have two 16-bit contexts I can relate to: PDP-11 and Intel.

The PDP-11 is truly a 16 bit CPU. It has a 64K address space, period. Going to 32 bits means being able to use more than 64K. Simple enough to understand, eh? (Well... PDP-11s usually have a protected MMU, only visible/usable by the OS, that can address up to 4M, and usually the OS offers services to "bank-swap" parts of the app's memory space.)

The Intel is a little trickier. Simple answer: Addressing large memory is *much* faster in 32 bits than in 16 bits, which must deal with things as if it were multiple 64K pieces. Read on for details...

The Intel architecture was born with segment:offset addressing, where a pointer to memory has two 16 bit parts. Your instructions run faster when you only consider and manipulate the 16 bit offset part. This is called a "near pointer", when only the 16 bit offset is known and the segment part is "assumed". To increment a near pointer is one fast instruction. Of course, since a near pointer is only 16 bits, there can only be 64K of "near" memory - not enough.

If you want to deal with memory objects larger than 64K you have to use a "far" pointer, which includes both the 16 bit segment and 16 bit offset. To increment a far pointer, you increment the offset until it overflows, then you change the segment. This is *far* slower.
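
As a rough illustration of the near/far difference described above, here is a minimal C++ sketch that models a real-mode far pointer in ordinary code. The names FarPtr, linear, inc_near and inc_far are made up for this example, and the 0x1000 segment adjustment assumes real mode only; it is a model of the idea, not what any particular compiler emits.

```cpp
#include <cstdint>

// Hypothetical model of a 16-bit far pointer (illustrative names only).
struct FarPtr {
    std::uint16_t segment;
    std::uint16_t offset;
};

// Real-mode address translation: segment * 16 + offset.
std::uint32_t linear(FarPtr p) {
    return (static_cast<std::uint32_t>(p.segment) << 4) + p.offset;
}

// Near increment: only the offset changes, and it silently wraps at 64K.
std::uint16_t inc_near(std::uint16_t offset) {
    return static_cast<std::uint16_t>(offset + 1);
}

// Far increment across 64K boundaries: bump the offset, test for wrap,
// then adjust the segment. 0x1000 paragraphs of 16 bytes = 64K bytes.
// ASSUMPTION: real mode; in protected mode this adjustment is not valid.
FarPtr inc_far(FarPtr p) {
    p.offset = inc_near(p.offset);
    if (p.offset == 0) {                                  // the conditional branch
        p.segment = static_cast<std::uint16_t>(p.segment + 0x1000);  // the segment change
    }
    return p;
}
```

Starting from 2000:FFFF (linear 0x2FFFF), inc_far returns 3000:0000 (linear 0x30000), while inc_near alone would wrap the offset back to 0 inside the same segment.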

First of all you have to look for overflow from the inc, meaning a conditional branch, the kind of thing that defeats instruction prefetch and pipelines. And when I say "change the segment" I'm talking about a serious operation.

For one thing, the segment part in a protected mode program is an MMU table subscript. The MMU owner (the OS) defines what segment points to what memory and what kind of memory it is. You can't just inc the segment value (or add 0x1000 like in real mode) to address the next consecutive memory location after an offset overflow; you have to know the OS's convention for allocating segment values.

Now add to that the fact that the simple-looking 16 bit segment load operation invokes complicated processor behavior. Remembering that a segment value is really an MMU table subscript, when you load the segment register, behind the scenes the CPU checks the value against the LDT/GDT to see if it's legal, then it fetches the 8 byte MMU table entry into a segment cache, and while it does it performs some other validity checks as well. The result is that segment register loading is a SLOW instruction.

My 386 reference says moving a register to another register takes two clock cycles, unless the destination is a segment register, in which case it takes 18. It takes nine times as long. Sheesh. A "near" subroutine call takes 7 cycles. A far subroutine call takes 34 - about five times slower. I don't have any newer references; perhaps the differences are smaller in the Pentium.

Now for 32 bit mode:

In 32 bit mode, suddenly the offsets are 32 bits. Now a "near" pointer (where the segment part is just assumed) is 32 bits and is large enough to address all the memory. Hey, the segment registers are still there and still fully functional; the 386 can support multiple 4G segments. However, ALL the 32 bit OS designers said "f*ck that" and gave applications only a single segment of enormous size. Suddenly life is good.
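
To make the "segment value is really an MMU table subscript" point above concrete, here is a small C++ sketch that splits a protected-mode selector into its fields. The bit layout (13-bit index, one table-select bit, two privilege bits) is the standard x86 selector format; the names Selector, decode and descriptor_offset are invented for this example.

```cpp
#include <cstdint>

// Fields of an x86 protected-mode segment selector (illustrative struct).
struct Selector {
    std::uint16_t index;  // bits 15..3: subscript into the descriptor table
    bool          local;  // bit 2: false = GDT, true = LDT
    std::uint8_t  rpl;    // bits 1..0: requested privilege level
};

Selector decode(std::uint16_t raw) {
    return Selector{
        static_cast<std::uint16_t>(raw >> 3),
        (raw & 0x4) != 0,
        static_cast<std::uint8_t>(raw & 0x3)
    };
}

// Each table entry is 8 bytes, so the entry sits at index * 8 in its table.
std::uint32_t descriptor_offset(Selector s) {
    return static_cast<std::uint32_t>(s.index) * 8u;
}
```

For example, the raw value 0x001F decodes to index 3 in the LDT with privilege level 3, and its 8-byte entry starts at byte 24 of that table. Nothing in the value says where the segment lives in memory, which is why arithmetic on it does not yield "the next 64K".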

Pointer math for large objects is of the simple one-instruction kind, as fast as any other math. Pointer loading is as fast as any other 32 bit load. No more segment override prefixes. In 16 bit mode it's architecturally impossible to have a stack larger than 64K, but in 32 bit mode it's as large as, well, every other segment (there's but one, remember?). Things are simpler and faster, and you can (almost) forget that the abomination of segments ever existed.

On the other hand...

Suppose you have a program that's small, simple and fast; it never needed more than 64K of memory and it doesn't count higher than 64 thousand. 32 bits won't make it faster, in fact it'll make it larger and slower. Sorry.

Oh, let me just throw in the newest good reason why 32 bits is faster than 16 bits. The new Pentium Pro slows down on 16 bit code. It uses fascinating new technology, replete with all the latest buzzwords; x86 instructions are decomposed into micro-ops and scheduled out-of-order to multiple functional units. There's a little problem with it, however. Its fancy data paths are designed for carrying 32 bit values around. If forced to operate on pieces of whole registers, like it would in 16 bit mode, then its registers can't rename and its pipelines stall until it can make whole 32 bit results from 16 bit operations.

--
Richard Krehbiel, Kastle Systems, Arlington VA USA
rich@kastle.com (work) or richk@mnsinc.com (personal)